Binary Neural Architecture Search
FIGURE 4.12
The main framework of the Discrepant Child-Parent model. In orange, we show the critical
novelty of DCP-NAS, i.e., tangent propagation and decoupled optimization.
architectures with binarized weights and activations, which considers both the real-valued
architectures and the binarized architectures.
4.4.3 Search Space
We search for computation cells as the building blocks of the final architecture. As in
[305, 307, 151] and Fig. 4.13, we construct the network with a predefined number of cells, and
each cell is a fully connected directed acyclic graph (DAG) G with N nodes. For simplicity,
we assume that each cell only takes the outputs of the two previous cells as input, and
each input node has predefined convolutional operations for preprocessing. Each node j is
obtained by
a(j) = Σ_{i<j} o(i,j)(a(i)),
o(i,j)(a(i)) = w(i,j) ⊗ a(i),
(4.27)
where i ranges over the nodes on which j depends, subject to the constraint i < j to avoid
cycles in a cell, and a(j) is the output of node j. w(i,j) denotes the weights of the
convolution operation between the i-th and j-th nodes, and ⊗ denotes the convolution
operation. Each node is a specific tensor, e.g., a feature map, and each directed edge (i, j)
denotes an operation o(i,j)(·), which is sampled from the following M = 8 operations:
FIGURE 4.13
The cell architecture for DCP-NAS. One cell includes 2 input nodes, 4 intermediate nodes,
and 14 edges.
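The cell computation of Eq. (4.27) can be sketched in a few lines. This is a minimal illustration and not the DCP-NAS implementation: identity maps stand in for the convolutional operations o(i,j), and the function names (`cell_forward`, `ops`) as well as the convention of concatenating the intermediate nodes as the cell output are assumptions borrowed from common DARTS-style search spaces.

```python
import numpy as np

def cell_forward(inputs, ops, num_nodes=4):
    """Evaluate one DAG cell as in Eq. (4.27).

    inputs    : list of 2 arrays, the outputs of the two previous cells
                (the two input nodes of the cell).
    ops       : dict mapping each edge (i, j) to a callable o(i,j).
    num_nodes : number of intermediate nodes (4 in Fig. 4.13).
    """
    states = list(inputs)  # nodes 0 and 1 are the two input nodes
    for j in range(2, 2 + num_nodes):
        # Each node j sums the transformed outputs of all earlier nodes i < j,
        # so the cell is a fully connected DAG with no cycles.
        states.append(sum(ops[(i, j)](states[i]) for i in range(j)))
    # Assumed convention: the cell output concatenates the intermediate nodes.
    return np.concatenate(states[2:])

# Toy example: identity operations on every edge.
a0 = np.ones(3)
a1 = 2 * np.ones(3)
ops = {(i, j): (lambda x: x) for j in range(2, 6) for i in range(j)}
out = cell_forward([a0, a1], ops)
```

Note that with 2 input nodes and 4 intermediate nodes, the edge dictionary has 2 + 3 + 4 + 5 = 14 entries, matching the 14 edges per cell shown in Fig. 4.13.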